Using Cognates in a French-Romanian Lexical Alignment System: A Comparative Study
Abstract
This paper describes a hybrid French-Romanian cognate identification module used by a lexical alignment system. The method works on lemmatized, POS-tagged, sentence-aligned parallel corpora and combines statistical techniques, linguistic information (lemmas and POS tags) and orthographic adjustments. We evaluate the module and compare it with other methods that rely on purely statistical techniques. We also study how the linguistic information and the orthographic adjustments affect the results of cognate identification and of cognate alignment. Our method outperforms the other statistical methods we implemented.
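To make the combination of linguistic filtering and orthographic adjustment more concrete, here is a minimal sketch. It is not the authors' implementation, and several assumptions are loudly hypothetical. The adjustment rules in `FR_ADJUSTMENTS` are illustrative only. Both taggers are assumed to emit a shared coarse tagset. The similarity measure is LCSR (longest common subsequence ratio), a common orthographic measure for cognates that is swapped in here. The threshold of 0.68 is also hypothetical. The statistical component of the hybrid method (e.g. corpus-level co-occurrence evidence) is omitted. The sketch scores candidate pairs within one aligned sentence pair, keeping only same-POS lemmas whose adjusted spellings are similar enough.

```python
import unicodedata
from typing import List, Tuple

# A token is a (lemma, POS tag) pair, as produced by a lemmatizer and tagger.
# Assumption: both languages use the same coarse tagset (e.g. NOUN, ADJ).
Token = Tuple[str, str]

# Hypothetical French-to-Romanian spelling adjustments applied to French
# lemmas before comparison. Illustrative only, not the paper's rule set.
FR_ADJUSTMENTS = [("ph", "f"), ("th", "t"), ("qu", "c"), ("y", "i")]


def strip_diacritics(word: str) -> str:
    """Remove accents (é -> e, ă -> a, ș -> s) via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_fr(lemma: str) -> str:
    """Lowercase, strip diacritics, then apply the hypothetical rewrites."""
    word = strip_diacritics(lemma.lower())
    for src, dst in FR_ADJUSTMENTS:
        word = word.replace(src, dst)
    return word


def normalize_ro(lemma: str) -> str:
    """Lowercase and strip Romanian diacritics (ă, â, î, ș, ț)."""
    return strip_diacritics(lemma.lower())


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer word."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def cognate_candidates(fr_sent: List[Token], ro_sent: List[Token],
                       threshold: float = 0.68) -> List[Tuple[str, str, float]]:
    """Return same-POS lemma pairs from one aligned sentence pair whose
    adjusted spellings reach the (hypothetical) LCSR threshold."""
    pairs = []
    for fr_lemma, fr_pos in fr_sent:
        for ro_lemma, ro_pos in ro_sent:
            if fr_pos != ro_pos:
                continue  # linguistic filter: POS tags must agree
            score = lcsr(normalize_fr(fr_lemma), normalize_ro(ro_lemma))
            if score >= threshold:
                pairs.append((fr_lemma, ro_lemma, round(score, 2)))
    return pairs


if __name__ == "__main__":
    fr = [("philosophie", "NOUN"), ("important", "ADJ"), ("le", "DET")]
    ro = [("filozofie", "NOUN"), ("important", "ADJ"), ("și", "CONJ")]
    print(cognate_candidates(fr, ro))
    # [('philosophie', 'filozofie', 0.89), ('important', 'important', 1.0)]
```

In the example, the adjustment "ph" -> "f" turns "philosophie" into "filosofie", which differs from "filozofie" by a single letter (LCSR 8/9). The POS filter discards cross-category pairs such as "le"/"și" before any string comparison is made.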
Similar resources
A Dictionary-Based Approach for Evaluating Orthographic Methods in Cognates Identification
In this paper we propose a method for identifying cognates based on etymology and etymons. We employ this approach to evaluate the extent to which lexical similarity can be used for automatic detection of cognate pairs. We investigate some orthographic approaches widely used in this research area and some original metrics as well. We apply this procedure for Romanian and its most closely relate...
Mental Representation of Cognates/Noncognates in Persian-Speaking EFL Learners
The purpose of this study was to investigate the mental representation of cognate and noncognate translation pairs in languages with different scripts to test the prediction of dual lexicon model (Gollan, Forster, & Frost, 1997). Two groups of Persian-speaking English language learners were tested on cognate and noncognate translation pairs in Persian-English and English-Persian directions with...
Word Transformation Heuristics Agains Lexicons for Cognate Detection
One of the most common lexical transformations between cognates in French and English is the presence or absence of a terminal “e”. However, many other transformations exist, such as a vowel with a circumflex corresponding to the vowel and the letter s. Our algorithms tested the effectiveness of taking the entire English and French lexicons from Treetagger, deaccenting the French lexicon, and t...
Brain activation and lexical learning: The impact of learning phase and word type
This study investigated the neural correlates of second-language lexical acquisition in terms of learning phase and word type. Ten French-speaking participants learned 80 Spanish words (40 cognates and 40 non-cognates) by means of a computer program. The learning process included the early learning phase, which comprised 5 days, and the consolidation phase, which lasted 2 weeks. After each phase, pa...
Building a Dataset of Multilingual Cognates for the Romanian Lexicon
Identifying cognates is an interesting task with applications in numerous research areas, such as historical and comparative linguistics, language acquisition, cross-lingual information retrieval, readability and machine translation. We propose a dictionary-based approach to identifying cognates based on etymology and etymons. We account for relationships between languages and we extract etymol...